{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 6.3 Interacting with Web APIs (网络相关的API交互)\n",
    "\n",
    "很多网站都有公开的API，通过JSON等格式提供数据流。有很多方法可以访问这些API，这里推荐一个易用的requests包。\n",
    "\n",
    "找到github里pandas最新的30个issues，制作一个GET HTTP request, 通过使用requests包："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import requests"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "url = 'https://api.github.com/repos/pandas-dev/pandas/issues'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "resp = requests.get(url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<Response [200]>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "response的json方法能返回一个dict，包含可以解析为python object的JSON："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Optimize data type'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = resp.json()\n",
    "data[0]['title']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'assignee': None,\n",
       " 'assignees': [],\n",
       " 'author_association': 'NONE',\n",
       " 'body': 'Hi guys, i\\'m user of mysql\\r\\nwe have an \"function\" PROCEDURE ANALYSE\\r\\nhttps://dev.mysql.com/doc/refman/5.5/en/procedure-analyse.html\\r\\n\\r\\nit get all \"dataframe\" and show what\\'s the best \"dtype\", could we do something like it in Pandas?\\r\\n\\r\\nthanks!',\n",
       " 'closed_at': None,\n",
       " 'comments': 1,\n",
       " 'comments_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/18272/comments',\n",
       " 'created_at': '2017-11-13T22:51:32Z',\n",
       " 'events_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/18272/events',\n",
       " 'html_url': 'https://github.com/pandas-dev/pandas/issues/18272',\n",
       " 'id': 273606786,\n",
       " 'labels': [],\n",
       " 'labels_url': 'https://api.github.com/repos/pandas-dev/pandas/issues/18272/labels{/name}',\n",
       " 'locked': False,\n",
       " 'milestone': None,\n",
       " 'number': 18272,\n",
       " 'repository_url': 'https://api.github.com/repos/pandas-dev/pandas',\n",
       " 'state': 'open',\n",
       " 'title': 'Optimize data type',\n",
       " 'updated_at': '2017-11-13T22:57:27Z',\n",
       " 'url': 'https://api.github.com/repos/pandas-dev/pandas/issues/18272',\n",
       " 'user': {'avatar_url': 'https://avatars0.githubusercontent.com/u/2468782?v=4',\n",
       "  'events_url': 'https://api.github.com/users/rspadim/events{/privacy}',\n",
       "  'followers_url': 'https://api.github.com/users/rspadim/followers',\n",
       "  'following_url': 'https://api.github.com/users/rspadim/following{/other_user}',\n",
       "  'gists_url': 'https://api.github.com/users/rspadim/gists{/gist_id}',\n",
       "  'gravatar_id': '',\n",
       "  'html_url': 'https://github.com/rspadim',\n",
       "  'id': 2468782,\n",
       "  'login': 'rspadim',\n",
       "  'organizations_url': 'https://api.github.com/users/rspadim/orgs',\n",
       "  'received_events_url': 'https://api.github.com/users/rspadim/received_events',\n",
       "  'repos_url': 'https://api.github.com/users/rspadim/repos',\n",
       "  'site_admin': False,\n",
       "  'starred_url': 'https://api.github.com/users/rspadim/starred{/owner}{/repo}',\n",
       "  'subscriptions_url': 'https://api.github.com/users/rspadim/subscriptions',\n",
       "  'type': 'User',\n",
       "  'url': 'https://api.github.com/users/rspadim'}}"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "data中的每一个元素都是一个dict，这个dict就是在github上找到的issue页面上的信息。我们可以把data传给DataFrame并提取感兴趣的部分："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>number</th>\n",
       "      <th>title</th>\n",
       "      <th>labels</th>\n",
       "      <th>state</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>18272</td>\n",
       "      <td>Optimize data type</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>18271</td>\n",
       "      <td>BUG: Series.rank(pct=True).max() != 1 for a la...</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>18270</td>\n",
       "      <td>(Series|DataFrame) datetimelike ops</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>18268</td>\n",
       "      <td>DOC: update Series.combine/DataFrame.combine d...</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>18266</td>\n",
       "      <td>DOC: updated .combine_first doc strings</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>18265</td>\n",
       "      <td>Calling DataFrame.stack on an out-of-order col...</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>18264</td>\n",
       "      <td>cleaned up imports</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>18263</td>\n",
       "      <td>Tslibs offsets paramd</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>18262</td>\n",
       "      <td>DEPR: let's deprecate</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>18258</td>\n",
       "      <td>DEPR: deprecate (Sparse)Series.from_array</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>18255</td>\n",
       "      <td>ENH/PERF: Add cache='infer' to to_datetime</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>18250</td>\n",
       "      <td>Categorical.replace() unexpectedly returns non...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>18246</td>\n",
       "      <td>pandas.MultiIndex.reorder_levels has no inplac...</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>18245</td>\n",
       "      <td>TST: test tz-aware DatetimeIndex as separate m...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>18244</td>\n",
       "      <td>RLS 0.21.1</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>18243</td>\n",
       "      <td>DEPR: deprecate .ftypes, get_ftype_counts</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>18242</td>\n",
       "      <td>CLN: Remove days, seconds and microseconds pro...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>18241</td>\n",
       "      <td>DEPS: drop 2.7 support</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>18238</td>\n",
       "      <td>BUG: Fix filter method so that accepts byte an...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>18237</td>\n",
       "      <td>Deprecate Series.asobject, Index.asobject, ren...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>18236</td>\n",
       "      <td>df.plot() very slow compared to explicit matpl...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>18235</td>\n",
       "      <td>Quarter.onOffset looks fishy</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>18231</td>\n",
       "      <td>Reduce copying of input data on Series constru...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>18226</td>\n",
       "      <td>Patch __init__ to prevent passing invalid kwds</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>18222</td>\n",
       "      <td>DataFrame.plot() produces incorrect legend lab...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>18220</td>\n",
       "      <td>DataFrame.groupy renames columns when given a ...</td>\n",
       "      <td>[]</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>18217</td>\n",
       "      <td>Deprecate Index.summary</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>18216</td>\n",
       "      <td>Pass kwargs from read_parquet() to the underly...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>18215</td>\n",
       "      <td>DOC/DEPR: ensure that @deprecated functions ha...</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>18213</td>\n",
       "      <td>Deprecate Series.from_array ?</td>\n",
       "      <td>[{'url': 'https://api.github.com/repos/pandas-...</td>\n",
       "      <td>open</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    number                                              title  \\\n",
       "0    18272                                 Optimize data type   \n",
       "1    18271  BUG: Series.rank(pct=True).max() != 1 for a la...   \n",
       "2    18270                (Series|DataFrame) datetimelike ops   \n",
       "3    18268  DOC: update Series.combine/DataFrame.combine d...   \n",
       "4    18266            DOC: updated .combine_first doc strings   \n",
       "5    18265  Calling DataFrame.stack on an out-of-order col...   \n",
       "6    18264                                 cleaned up imports   \n",
       "7    18263                              Tslibs offsets paramd   \n",
       "8    18262                              DEPR: let's deprecate   \n",
       "9    18258          DEPR: deprecate (Sparse)Series.from_array   \n",
       "10   18255         ENH/PERF: Add cache='infer' to to_datetime   \n",
       "11   18250  Categorical.replace() unexpectedly returns non...   \n",
       "12   18246  pandas.MultiIndex.reorder_levels has no inplac...   \n",
       "13   18245  TST: test tz-aware DatetimeIndex as separate m...   \n",
       "14   18244                                         RLS 0.21.1   \n",
       "15   18243          DEPR: deprecate .ftypes, get_ftype_counts   \n",
       "16   18242  CLN: Remove days, seconds and microseconds pro...   \n",
       "17   18241                             DEPS: drop 2.7 support   \n",
       "18   18238  BUG: Fix filter method so that accepts byte an...   \n",
       "19   18237  Deprecate Series.asobject, Index.asobject, ren...   \n",
       "20   18236  df.plot() very slow compared to explicit matpl...   \n",
       "21   18235                       Quarter.onOffset looks fishy   \n",
       "22   18231  Reduce copying of input data on Series constru...   \n",
       "23   18226     Patch __init__ to prevent passing invalid kwds   \n",
       "24   18222  DataFrame.plot() produces incorrect legend lab...   \n",
       "25   18220  DataFrame.groupy renames columns when given a ...   \n",
       "26   18217                            Deprecate Index.summary   \n",
       "27   18216  Pass kwargs from read_parquet() to the underly...   \n",
       "28   18215  DOC/DEPR: ensure that @deprecated functions ha...   \n",
       "29   18213                      Deprecate Series.from_array ?   \n",
       "\n",
       "                                               labels state  \n",
       "0                                                  []  open  \n",
       "1                                                  []  open  \n",
       "2                                                  []  open  \n",
       "3                                                  []  open  \n",
       "4   [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "5                                                  []  open  \n",
       "6   [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "7                                                  []  open  \n",
       "8   [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "9   [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "10  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "11  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "12                                                 []  open  \n",
       "13  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "14  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "15  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "16  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "17  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "18  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "19  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "20  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "21                                                 []  open  \n",
       "22  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "23  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "24  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "25                                                 []  open  \n",
       "26  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "27  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "28  [{'url': 'https://api.github.com/repos/pandas-...  open  \n",
       "29  [{'url': 'https://api.github.com/repos/pandas-...  open  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "issues = pd.DataFrame(data, columns=['number', 'title', \n",
    "                                    'labels', 'state'])\n",
    "issues"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过一些体力活，我们可以构建一些高层级的界面，让web API直接返回DataFrame格式，以便于分析。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [py35]",
   "language": "python",
   "name": "Python [py35]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
